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ABSTRACT 

21 cm tomography is expected to be difficult in part because of serious foreground 
contamination. Previous studies have found that line-of-sight approaches are capable 
of cleaning foregrounds to an acceptable level on large spatial scales, but not on small 
spatial scales. In this paper, we introduce a Fourier-space formalism for describing 
the line-of-sight methods, and use it to introduce an improved new method for 21 cm 
foreground cleaning. Heuristically, this method involves fitting foregrounds in Fourier 
space using weighted polynomial fits, with each pixel weighted according to its in- 
formation content. We show that the new method reproduces the old one on large 
angular scales, and gives marked improvements on small scales at essentially no extra 
computational cost. 
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1 INTRODUCTION 

Neutral hydrogen tomography is emerging as a promising 
new probe of the epoch of reionization and cosmology. By 
taking advantage of the 21 cm hyperfine transition, neutral 
hydrogen tomography in principle allows one to map the dis- 
tribution of hydrogen over a large range of redshifts, some of 
which are accessible through no other observational probes. 
For example, neutral hydrogen tomography can potentially 
provide the only measure of the universe's expansion his- 
tory, thermal history, as well as its clustering growth dur- 
ing the so-called dark ages. Furthermore, the dramatic in- 
crease in the volume that can be mapped by the technique 
could enable precision tests of inflation, including stronger 
constraints on the spectral index of inflationary seed fluc- 
tuations, the running of the index, and small-scale non- 



Gaussianity (McQuinn et al. |2006 


Santos & Cooray||2006 


Wyithe et al.||2007 Bowman et al. 


2007 Mao et al.||2008|. 



Neutral hydrogen tomography has also been predicted to 
be a sensitive probe of other parameters such as neutrino 
masses and the dark energy equation of state, either through 
power spectrum measurements ( |Mao et al.||2008[) or other 



probes such as 21cm lensing tomography (Zahn & Zaldar- 
riaga|2006| |Metcalf fc White|2009| |Benton Metcalf|2009[ )~ 

Despite its promise, a number of challenges must be 
overcome before neutral hydrogen tomography becomes a 
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Figure 1. 2D power spectra of foregrounds and foreground resid- 
uals using the "old method" jBowman et al.|2009||Liu e t al. 2009) 
and the "new method" (this paper). At low-fc the two methods 
give identical results, while at high-fc the new method does much 
better. Sudden spikes in the foreground residuals occur only with 
the old method. 



reality. One serious problem is the issue of foreground con- 
tamination. A variety of astrophysical sources, including un- 
resolved extragalactic points sources, resolved point sources, 
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and galactic synchrotron radiation, will contribute contami- 
nants with brightness temperature on the order of hundreds 
of Kelvins. This will dominate the cosmological signal (which 
is expected to be on the order of mK), and so robust fore- 
ground subtraction techniques will be essential. 

Previous studies have examined the feasibility of fore- 
ground subtraction in neutral hydrogen tomography, and 
have generally found that variations of the line-of-sight ap- 



proach pioneered by Zaldarriaga et al 
(20061), and IMcQuinn et al 



(2004), Wang et al. 



( 2006 1 may be able to clean 



out foreground contamination to an acceptable degree, al- 
though instrumental effects such as noise may compromise 
the quality of the cleaned maps. Wang et al.|(|2006"|);|Bowman 



et al. 


(2009); Jelic et al. 


(2008 


; Gleser et al.| (|2008|); |Harker 


et al. 


(20091; Liu et al. 


(2009 


I performed simulations that 



included fiducial models of these effects, and found reason- 
ably encouraging results. It should also be noted that many 
of these instrumental effects, though serious, represent prob- 
lems that are decoupled from the foreground subtraction 
challenge. As discussed in Liu et al. (20091, any linear sub- 



traction algorithm leaves the noise contribution to the power 
spectra unaffected, and so noise bias removal can be dealt 
with separately. For instance, instrumental noise bias can be 
removed from power spectra by cross-correlating maps made 
from data taken at different times. Whether the results are 
ultimately acceptable for Epoch of Reionization science will 
be difficult to answer until experimental data is obtained. 

In any case, it is important to consider a wide vari- 
ety of possible foreground subtraction algorithms, and in 
this paper we propose a new variation on the traditional 
line-of-sight methods. Specifically, we describe a cleaning 
algorithm that (unlike moslQ proposals) is implemented in 
Fourier space. As we discuss in Section [3j this allows one to 
completely sidestep any problems that may arise from the 
frequency dependence of an instrument's beam, which was 
previously the limiting factor in the quality of foreground 



cleaning at high-wavenumber spatial Fourier modes (Bow- 
|man et al. 2009 Liu et al. 20091. The increase in perfor- 



mance at such wavenumbers can be easily seen in figure 
[l] where we have taken simulated data from a single fre- 
quency slice (y = 158.73 MHz, corresponding to a 21-cm 
signal coming from 2 = 8) and plotted [k 2 P2D(k)] 1 ^ 2 , where 
P2D{k) refers to the two-dimensional spatial power spec- 
trum. The quantity [k 2 P2D(k)] 1 ' 2 can be thought of as the 
fluctuation level as a function of scale, and is exactly anal- 
ogous to 8 T /T(x [£ 2 C e ] 1/2 in cosmic microwave background 
experiments. 

The rest of the paper is organized as follows. In Sec- 
tion [2] we review the old method used in IBowman et al.l 
( 2009[ ); |Liu et al.| ( |2009) , and in Section [T 



we recast it as 



an algorithm in Fourier-space. The Fourier-space description 
is then used to introduce our new method in Section [3] We 
conclude in Section 0] 



2 REVIEW OF OLD METHOD 

In general, the data collected from a typical 21-cm tomog- 
raphy experiment can be thought of as populating a "data 



cube" : stacks of 2D images separated by redshift or fre- 
quency. Along the transverse directions, the axes are usually 
labeled in one of three ways: 

(i) Real-space coordinates 6 X and 8 y . In this case the data 
cube is a literal map of 21-cm emission and foreground con- 
taminants. 

(ii) Interferometer coordinates u and v. Under the correct 
convention, these are simply the Fourier-conjugate coordi- 
nates to 6 X and 8 y . The data cube is a stack of 2D maps in 
Fourier space. 

(iii) Fourier-space coordinates k x and k y . These are the 
Fourier-conjugate coordinates to the physical lengths x and 
y. Up to factors of 2-n (depending on one's Fourier conven- 
tion), (k x ,k y ) ~ (u,v)/Dm, where Dm is the transverse 
comoving distance. 

In a typical experiment the data (in the form of visibilities) 
are collected in ^-coordinates, while the results are pre- 
sented in either real-space coordinates (in the case of sky 
maps) or in Fourier-space coordinates (in the case of power 
spectra). Foreground removal is often done in either real- 



space coordinates (as demonstrated in Wang et al.| 2006 



Bowman et al. (2009); Jelic et al. (20081; Liu et al 



or in ww-space (as done in |Zaldarriaga et al.| ( |2004[ ) 



2009) 



Gleserl 



1 Zaldarriaga et al.| ( |2004"l l and |GIeser et al.| | |2008| l are exceptions 
and consider algorithms for Fourier space subtraction. 



et al. (2008), and as we propose in this paper). 

We first review the real-space removal algorithms. The 
fundamental idea behind all such algorithms is the fact that 
the 21-cm signal is expected to oscillate rapidly with fre- 
quency while the relevant foreground contaminants are spec- 
trally smooth. The contaminants along a given line-of-sight 
can therefore be separated from the signal by plotting the 
flux as a frequency and subtracting off a smooth component 
(such as a low-order polynomial) from the total signal. What 
remains is the cosmological signal and a (hopefully small) 
residual contamination. 

Previous studies have simulated the aforementioned 
real-space algorithms and have estimated the level of resid- 
ual contamination that can be expected for current experi- 
ments ( Jelic et al.|20 08 Bow man et al.|2009 1 as well as how 
the residuals depend on the properties of a generic interfer- 
ometer ( |Liu et al. 2009). Although these papers have high- 
lighted the fact that the quality of foreground subtraction 
is highly dependent on a large number of parameters (both 
instrumental and those pertaining to data analysis), they 
also suggest that the qualitative behavior is rather generic. 
In what follows we examine the qualitative behavior that 
emerges, emphasizing the various features and their mathe- 
matical origin. 

Consider the spectra shown in figure [2] The sum of the 
two black curves show the frequency dependence of a sin- 
gle pixel in real-space coordinates (i.e. the frequency depen- 
dence of a particular line-of-sight), as seen by a typical 21 cm 
tomography interferometeij^] This total spectrum (formed 
from the sum of the two black curves) contains foregrounds 



2 We use the Murchison Widefield Array as our fiducial model for 
the simulations in this paper (see |Liu et al,||2009| l for details), but 
it should be noted that the algorithm we propose in Section[3]can 
be applied to data collected by any interferometric configurations. 



An Improved Method for 21cm Foreground Removal 3 



l.b 



1 - 



0.5 



-1 - 













— «— Jagged foreground contribution 




— i — Smooth foreground contribution 




- - Old method fit 




New method fit 



1 1§8.5 159 159.5 160 160.5 161 161.5 

Frequency in MHz 

Figure 2. The spectrum of a typical real-space pixel as seen by an 
interferometer. The instrument introduces a jerky dependence on 
frequency even though the foregrounds are intrinsically smooth. 
The total signal is the sum of a smooth component (black curve 
with "x" markers) coming from the central parts of the uu-plane 
and a jagged component (black curve with "+" markers) from the 
outer parts of the plane. The blue/dot-dashed line gives the fore- 
ground fit using the old method, while the red/dashed line gives 
the analogous real space "fit" using the new method (see Sec- 
tion [3] for details). The means of each curve have been artificially 
removed for clarity. 



Real space (beam) Fourier space (distribution of baselines) 




Figure 3. The left hand column shows sample beam profiles (a 
real-space description of the beam) while the right hand column 
shows the corresponding MD-distribution of baselines (a Fourier- 
space description of the beam) . The top row illustrates an array 
with no rotation synthesis, while the bottom row shows an ar- 
ray with 6 hours of rotation synthesis. The real-space beams are 
normalized so that their peaks are at 1. 



only, with no noise]^] or cosmological signal. Since the fore- 
grounds are known (and are simulatecQ to be spectrally 
smooth, this suggests that the rapid oscillations seen in the 
figure are caused by the instrument. This is bad news for the 
subtraction algorithm, as it means that simply fitting out 
the smooth component of a spectrum will leave residuals 
that can be confused with the cosmological signal. Indeed, 
it can be seen from the figure that the fit seems rather poor. 

One way of understanding the rapid oscillations is to 
consider the interferometer's beam in real-space. The left- 
hand panel of figure [3] shows that the beam of a typical 
interferometer contains "frizz" outside the central peak that 
oscillates rapidly with angle. Since beam widths scale as 
X/D, this angular oscillation translates into an oscillation in 
frequency, which is what is seen in figure [2] Alternatively, 
the behavior of figure [2] can be understood by considering 
the effect of an interferometer's beam in iro-space. An inter- 
ferometer samples pixels in the Mf-plane, and with enough 
of these uw-pixels one can produce a real space image by 
Fourier transforming. Thus, the spectrum of a single pixel 
in real space can be thought of as a linear combination of the 
spectra of different uv pixels sampled by the interferometer. 

3 In our simulations, we neglect instrumental noise. This repre- 
sents no loss of generality because our subtraction algorithms are 



Liu et al. 



(2009) for details). 



linear (please see Section II or 

4 The simulation methodology used in this paper was the same 
as that used in |Liu et al. l|2009|), where point sources were inde- 



pendently generated in each pixel from source count distributions 
given in |Di Matteo et al.| ( |2002} . Please see |Liu et aT] l |2009| ) for 
details. 



Exactly which pixels are sampled depends on the layout of 
the interferometer in question, but in a typical 21-cm to- 
mography experiment the uv coverage is complete near the 
origin and drops off as one moves farther out. 

In general, the foreground spectrum seen by an instru- 
ment can be considered the sum of two components: a com- 
ponent that is formed from a linear combination of wu-pixels 
where the interferometer's coverage is complete (i.e. the in- 
ner parts of the w-plane), and a component that is formed 
from w-pixels residing in parts of the uu-plane where cov- 
erage is sparse (i.e. the outer regions). These components 
are shown using solid black lines in figure [2] The line with 
"x" markers (showing the part of the signal originating from 
the inner parts of the plane) is seen to be smooth, whereas 
the line with "+" markers (showing the contribution from 
the outer parts) is what contributes the rapid oscillations. 
(Note that this curve appears to have zero temperature only 
because we have artificially removed the mean of each curve 
for graphical clarity). This decomposition explains why real- 
space pixel-by-pixel foreground subtraction algorithms have 
been shown to be adequate even though the fits themselves 
seem terrible at first sight. Even though the smooth fits can- 
not subtract off the jerky component of the spectrum, they 
are capable of fitting out the smooth component that comes 
from the central parts of the Fourier plane. Indeed, this is 
exactly what is seen in figure [T] where the low-fe parts of the 
power spectrum are cleaned effectively whereas the high-fc 
parts remain contaminated. It is simply the case that by ex- 
amining pixels in real space, one is viewing a "bad" linear 
combination of pixels that mixes together the well-fit, cen- 
tral located Mil-pixels with the outer wu-pixels where sparse 
baseline coverage results in jerky spectra that are badly fit. 
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2.1 Fourier space description of decontamination 

In the previous section, we examined how foreground fits of 
real-space pixels could be understood by considering the flux 
in each pixel as being a linear combination of different uv- 
pixels. We now show that one can go further and perform the 
fits themselves in ttti-space and get exactly the same results. 
With slight modifications, this will lead to the discussion in 
Section|3]of an improved method for subtracting foregrounds 
at high k. 

Consider the steps that must be taken to perform the 
foreground subtraction outlined above. The data is collected 
by the interferometer in Fourier space i.e. in a (u, v, v) data 
cube. This data must then be Fourier transformed in the two 
transverse directions, giving an x-y-v data cube. Fitting is 
subsequently performed in the frequency direction. Mathe- 
matically, we can express this as follows. Let jjij a represent 
the initial data cube, with the first two (Latin) indices be- 
ing the two spatial indices and the last (Greek) index being 
the frequency index. With no loss of generality, we can fold 
the first two indices into one and write yj a instead. In this 
notation, the Fourier transform can be written as 



Ilk, 



(i) 



where F is the Fourier matrix and y is the real-space analog 
of y. The fit in the frequency direction can be represented 
by yet another linear operatoi[^] G, and so we have 



(2) 



where y represents the fit. In the last expression, note that 
G possesses only Greek indices whereas F only has Latin 
indices. This means that the two operations performed in 
our algorithm - the 2D spatial Fourier transform (F) and 
the fitting in the frequency direction (G) - in fact commute, 
i.e., FG = GF. 

The fact that the Fourier transform commutes with the 
fitting means that we can perform the two operations in ei- 
ther order. In other words, we can think of the foreground 
fitting and subtraction as taking place in Fourier space with- 
out changing any of the results (which is something that we 
have also verified numerically). Viewing the process as a 
pixel-by-pixel fitting in uu-space reveals exactly why there 
exists such a vast difference between the quality of the clean- 
ing at low-fc and at high-fc, and why the transition between 
the two regimes appears as such an abrupt jump in the power 
spectra. In figure [4] we show typical spectra from different 
parts of the uu-plane. The top panel shows a typical pixel 
from the inner part of the plane. The spectrum is plotted 
using so-called uniform weighting, so that in every Fourier 
pixel the interferometer acts as an on/off switch: the inter- 
ferometer imposes a weighting of to a pixel if no baselines 
fall in that pixel, and a weighting of 1 otherwise (regard- 
less of how many baselines are binned into that pixel) . It is 
evident that a simple polynomial fit does extremely well. 



5 Explicitly, for the case where one fits a polynomial of degree 
m, one has G = X[X t N~ 1 X] - 1 X t N" 1 , where N is the noise 
covariance matrix and X is an n X (m + 1) matrix such that Xij 
equals the frequency of the ith frequency channel taken to the 
jth power | |Wang et al.|2006 l. 
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Figure 4. Spectra of various uv pixels from different parts of the 
plane. From the top panel to the bottom panel, one is moving 
away from the origin. It is clear that the data can be easily fit by 
low-order polynomials in the top panel, but that the old method 
of fitting (dashed red curves) becomes inadequate when baseline 
coverage begins to drop out. The solid black curves show the fits 
done using the new method describe in Section [3] 



On the other hand, when one moves out to regions of 
the iro-plane where baseline coverage becomes sparse, the fit 
becomes poor. A glance at the bottom two panels of figure 
[4] makes the problem clear - when coverage is sparse, at cer- 
tain frequencies there is no baseline coverage, and a simple 
polynomial fit is unable to deal with this. We emphasize that 
the trouble is not with incomplete Fourier coverage per se. 
It is the fact that the incomplete coverage is changing with 
frequency. In other words, foreground subtraction becomes 
poor in this regime because the frequency dependence of the 



beam (or "mode-mixing" , as emphasized in Bowman et al. 
( |2009[ ) ; Liu et al. (2009)) becomes important on these small 
(high-fc) scales. Note that even though this problem exists 
when the spectra are being fit in real space, it is not appar- 
ent unless one fits in uu-space, where the pixels are "good" 
linear combinations of the data. 



3 NEW METHOD 

We now propose a slight modification to the foreground sub- 
traction algorithm that evades the aforementioned problem. 
From figure [4] one can see that an alternate way of phrasing 
the problem is to say that the old fitting algorithm, being 
mathematically equivalent to a fitting in real space, is unable 
to distinguish between pixels with no data and pixels with 
values that happen to be zero. In uu-space, however, one 
can easily identify pixels with no baselines, and so one can 
simply skip frequencies where data is unavailable. In fact, 
one can find the optimal fit (in the sense of having minimal 
r.m.s. errors) by employing an inverse-variance weighted fit. 
In this scheme, the weight of each point in the least-squares 
sum is proportional to N, the number of baselines that are 
binned into a particular ttu-pixel at a particular frequency. 
This way, points with lower signal-to-noise are given less 
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Figure 5. Post-subtraction residuals shown in the uti-plane at v = 157 MHz (left column) and as a function of frequency, taking a cut 
through the center of each Jiti-plane (right column). This is done for the old method (top row) as well as for the new method (bottom 
row). The new method does not offer any increase in performance at low-fc, but avoids the large increase in residuals at high-fc. 



weight, and points with no data at all are given zero weight 


In figure [4] it can be seen that since missing frequencies 
are now given zero weight in the fit, one obtains excellent 
fits even for ui;-pixels where baseline coverage is sparse. This 
improves the subtraction of foregrounds at frequencies where 
there is data, whereas at the skipped frequencies nothing 
has been compromised since no foregrounds were detected 
by the instrument in the first place. 

The effect that the frequency-skipping has on the 2D 
power spectrum is shown in figure [T] To be conservative, 
we have also tested our new algorithm using a completely 
independent pipeline with a different foreground model (for 

8 It is important to emphasize that in this section, we use the 
term "weight" to refer to the statistical weight that we give to a 
data point in the fit. We are not pre-multiplying the data with a 
weighting function. In other words, while our fits assign different 
statistical weights to each data point, the data points themselves 
are not tampered with ahead of time and are simply "uniformly 
pre- weighted" as described in Section [2T| 



details, please see |Bowman et alT| ( |2009] |). The results from 
the second pipeline are shown in figure [5] and the fact 
that the results agree demonstrate the fact that uu-plane 
cleaning is generally applicable and not dependent on the 
foreground model. Qualitatively, one can see that at low- 
fc there is no improvement from the old method because 
in that regime one is limited by the fact that simple low- 
order polynomials will not in general be perfect fits to the 
foregrounds, even though the foregrounds are smooth func- 
tions. At high-fc, however, one avoids the dramatic increase 
in post-subtraction foreground residuals, because previously 
the limitation at high-fc was the mode-mixing problem. With 
our new method, the limiting factor is the ability of the fit- 
ting function to match the form of the foregrounds. For ex- 
ample, the fact that the foreground residuals in figure [I] are 
a constant factor (~ 10 6 ) off from the original foregrounds 
regardless of scale (or equivalently, regardless of location on 
the Mu-plane) means that the residuals are due entirely to 
the quality of the fit. In other words, the residuals of one part 
in ~ 10 6 come from the fact that the second-order polyno- 
mials used in the fits to produce figure [I] are good fits to 
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the foregrounds only to one part in ~ 10 . With the chief 
limitation now being the fitting itself, one can in principle 
subtract foregrounds up to Fourier modes that correspond 
to the longest baselines, although as one is forced to skip 
many frequencies at high k, the signal-to-noise of the data 
is reduced. 

As a weighting scheme that weights data points accord- 
ing to their information content, inverse-variance weight- 
ing not only gives higher signal-to-noise data points greater 
weight, but also automatically incorporates frequency- 
skipping, since the skipped frequencies are simply those with 
N — and therefore no information. While both effects con- 
tribute to better foreground subtraction, we find frequency- 
skipping to be the dominant cause of this increase in perfor- 
mance. 

It is important to note that whereas without the skip- 
ping of empty frequencies the transverse Fourier transform 
commuted with the fitting of the foregrounds, under the 
new scheme proposed here the two operations no longer 
commute. This is because the frequencies of the pixels that 
need to be skipped require knowing the baseline distribu- 
tion (which lives in tw-space) and therefore depends on the 
location of the uw-pixel being cleaned. Mathematically, this 
means that in equation[2] the fitting operator G acquires an 
extra i (spatial) index and the two sums no longer commute. 
The significance of this is that the fit can no longer be done 
in real-space. To apply this new algorithm for foreground 
subtraction, one must work in Fourier space. 

However, while the skipping of frequencies in our fit dic- 
tates that we must work in Fourier space, the improvements 
brought about by the new algorithm can still be seen in 
real space. Consider the dashed (red) fit in figure [2] This fit 
was obtained by taking the uu-space fits generated by the 
new algorithm and Fourier transforming real space to give 
a real space "fit". It is clear from the figure that the new 
method does a much better job of tracking the behavior of 
the smooth foreground component. On the other hand, the 
slope of the fit from the old method is biased by the jagged 
foreground contribution (which, remember, is an instrumen- 
tal artifact that arises from incomplete baseline coverage), 
and does a worse job tracking the smooth foregrounds. 

The fact that our new method traces the smooth fore- 
ground component better means that it can be used to get 
better estimates of the foregrounds themselves. One sim- 
ply Fourier transforms the fits produced by the new algo- 
rithm to get real-space, multi-frequency maps of the fore- 
grounds. Such maps will be of a higher quality than those 
that are simply imaged by the instruments. This is because 
our new fitting algorithm can be interpreted as one where 
the missing frequencies are not so much skipped as inter- 
polated over. By fitting low-order polynomials over the fre- 
quencies where data is available, one is essentially deriving 
a foreground model that can be extrapolated to other fre- 
quencies. Without missing frequencies in the spectra, the 
real space foreground maps will not have artificially jagged 
foreground components, and will therefore be a more accu- 
rate representation of the true foregrounds. 



4 CONCLUSIONS 

In this paper, we have shown that there is an easy expla- 
nation for the increased foreground residuals at high-fc: a 
frequency-dependent incompleteness of baseline coverage in 
the outer parts of the uu-plane makes the foregrounds in cer- 
tain ify-pixels difficult to fit out using a simple unweighted 
polynomial fit. The solution to this problem is to weight the 
fit so that frequencies with no information are given zero 
weight, while other frequencies are given an inverse- variance 
weighting. As seen in figure [l] this allows foreground clean- 
ing to be performed at much higher k, paving the way for 
higher quality power spectrum measurements in neutral hy- 
drogen tomography. 
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